Abstract

An outer bound on the low-entanglement remote state preparation (RSP) ebits vs. bits 
tradeoff curve is found using techniques of classical information theory. We show this bound 
to be optimal among an important class of protocols and conjecture optimality even without 
this restriction. 

We all know what state preparation is: Alice, having complete classical knowledge of a quantum 
state, prepares it in her lab. Remote state preparation (RSP) refers to the case where Alice, again 
having a classical description of the state, wishes to prepare a physical instance of it in Bob's 
lab, Bob being far away. It seems natural to ask how this situation differs from quantum
teleportation [ref], where Alice has no classical knowledge of the state but has a physical instance of it.
This was first addressed by Pati [ref] and Lo [ref], who considered special ensembles of states. The
general case was investigated by Bennett et al. [ref], who posed the question of quantifying
the resources necessary and sufficient for asymptotically perfect RSP. Asymptotic perfection means 
that the average fidelity between the resulting states in Bob's lab and the corresponding states 
Alice intended him to prepare tends to 1 as the number of states to be remotely prepared is taken 
to infinity. The resources are the same as for teleportation: entanglement (ebits) between Alice 
and Bob and classical bits of forward communication from Alice to Bob. They also allow classical 
back-communication from Bob to Alice, this extra resource being unhelpful for teleportation. For 
the case of qubit states Bennett et al. found outer bounds on the achievable (b,e) pairs by explicit 
construction of RSP protocols (see Fig. 1). The teleportation point (2, 1) naturally divides the plane
into high- and low-entanglement regions, where the number of ebits per remotely prepared state
is greater than and less than 1, respectively; there is a large qualitative difference in the methods 
used for these two cases. The high-entanglement region is accessed by Alice performing certain 
generalized measurements on her ebit halves that possibly depend on her classical knowledge of 
the state, and sending classical information about the measurement results to Bob. The
low-entanglement protocols described in [ref] (which we refer to as teleportation based) involve sending
classical information about the states themselves causing a reduction in the posterior von Neumann 
entropy from Bob's point of view, and teleportation of Schumacher compressed states. Here we 
concentrate on the latter case, pushing these ideas to their information theoretical limit. The 
main result is an analytic expression for the best teleportation based outer bound on the
low-entanglement region. Our approach borrows heavily from Shannon's classical rate-distortion theory
[ref], and we will emphasize the key concepts and ideas, relegating technical details to a future
publication.
FIG. 1 Ebits vs. bits for remote state preparation (from [ref]). The dotted curve represents our
low-entanglement outer bound. The solid curve is the previous outer bound by Bennett et al. The shaded
region is forbidden by causality.

Let us first consider an example (attributed to H.-K. Lo [ref]) illustrating the way classical
information about a qubit state reduces its von Neumann entropy. It is important to appreciate 
the fact that, in the scenario we are dealing with here, the density matrix is not a property 
of the qubit, but rather reflects knowledge about the actual pure state the qubit is in. Alice 
knows her states exactly prior to remotely preparing them, hence the individual density matrices 
have zero entropy, from her point of view. At the same time Bob is completely ignorant of the 
qubit states; for all he knows Alice could have chosen them from anywhere on the Bloch sphere. 
More formally, if we denote the Bloch sphere by $\mathcal{X}$, parametrized by spherical polar coordinates
$x = (\theta_x, \phi_x) \in [0,\pi] \times [0, 2\pi]$ (for convenience we will refer to the north pole $\theta_x = 0$ as $x = 0$),
then the probability density corresponding to picking $x$ is simply $p(x) = \frac{1}{4\pi}$. The corresponding
quantum state is $|x\rangle = \sqrt{\frac{1+\cos\theta_x}{2}}\,|0\rangle + e^{i\phi_x}\sqrt{\frac{1-\cos\theta_x}{2}}\,|1\rangle$. The resulting density matrix from Bob's
point of view is $\rho = \int dx\, p(x)|x\rangle\langle x| = \frac{1}{2}I$, and the von Neumann entropy is $S(\rho) = 1$, as one
would expect from such a random distribution. Now, let us assume Alice gives Bob 1 bit of 
classical information about the state, e.g., tells him whether the state is in the upper or lower 
Bloch hemisphere. The posterior distribution is now uniform in the upper (lower) hemisphere, 
i.e. $p'(x) = \frac{1}{2\pi}$ for $x$ in the upper (lower) hemisphere and zero otherwise. The density matrix $\rho'$ is
computed as above, and the posterior von Neumann entropy becomes $S(\rho') \approx 0.81$ in either case.
Schumacher's theorem [ref] now tells us that we have reduced the amount of quantum information
to be conveyed to Bob, at the expense of an additional classical rate of 1 bit per letter. Based on
this observation a protocol may be devised as follows [ref].

• Alice sends classical information to Bob at a rate R = 1 bit per remotely prepared state 
about which hemisphere the state lies in. 

• This reduces the von Neumann entropy of the source (as viewed by Bob) to $S \approx 0.81$.
However, the density matrices now depend on the hemisphere. So Alice rotates, say, all the states 
in the lower hemisphere by a preagreed unitary transformation that maps the lower onto the upper 
hemisphere (any rotation sending the south pole to the north accomplishes this). Now the qubits 
are i.i.d. from Bob's point of view, and Schumacher's theorem applies. Alice prepares these rotated 
states, and Schumacher compresses them to S qubits per letter. 

• Alice teleports the compressed qubit states at a rate of $2S$ bits and $S$ ebits per remotely
prepared state.

• Bob simply reverses Alice's steps in his laboratory, thus recovering asymptotically faithful 
instances of her states. 

This teleportation based protocol yields the point $(2S + R, S)$ in the $(b,e)$-plane. The property
of being asymptotically faithful is inherited from Schumacher compression, this being based on
classical Shannon compression. It is a low-entanglement protocol since $e = S < 1$. It is now
evident that, if we restrict attention to teleportation based protocols, the problem reduces to
finding the optimum rate-entropy curve, i.e. the frontier of $(R, S)$ pairs attainable in this way. One
may wonder, for example, how it is possible to further reduce $S$ while keeping $R = 1$. The answer
lies in exploiting the asymptotic formulation of the problem and processing blocks of states, now
minimizing the entropy per remotely prepared state.
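
As a quick numerical check of Lo's example, the following minimal Python sketch evaluates the posterior entropy for the hemisphere code and the resulting point in the $(b,e)$-plane. It uses the standard fact that a qubit density matrix with Bloch vector length $r$ has eigenvalues $(1 \pm r)/2$. The function names are illustrative assumptions and do not come from the original text.

```python
import math


def h2(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def qubit_entropy(bloch_length):
    """Von Neumann entropy (bits) of a qubit state whose Bloch vector has length r."""
    return h2((1.0 + bloch_length) / 2.0)


def hemisphere_bloch_length():
    """Length of the mean Bloch vector for a uniform density on one hemisphere.

    On the upper hemisphere cos(theta) is uniform on [0, 1], so the mean
    Bloch vector is (0, 0, 1/2).
    """
    return 0.5


def be_point(R, S):
    """(bits, ebits) per prepared state for a teleportation based protocol."""
    return (2.0 * S + R, S)


S_hemisphere = qubit_entropy(hemisphere_bloch_length())
b, e = be_point(1.0, S_hemisphere)
print(f"S = {S_hemisphere:.4f}, (b, e) = ({b:.4f}, {e:.4f})")
```

With $R = 1$ this places the hemisphere protocol at roughly $(2.62, 0.81)$, i.e. below the teleportation point $(2, 1)$ in entanglement at the cost of extra classical communication.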

We proceed to formulate the source coding problem. The source is described by a random
vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, and we take the individual $X_i$ to be independent and identically
distributed (i.i.d.), each taking values $x$ on the Bloch sphere $\mathcal{X}$ with probability density $p(x) = \frac{1}{4\pi}$.
Thus the probability density for $\mathbf{X}$ is $p(\mathbf{x}) = \prod_i p(x_i)$. This reflects Bob's view before
he receives any classical information. Elements $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ of $\mathcal{X}^n$ are called source words
of length $n$, and the $x_i$ are called letters. We map the source $\mathbf{X}$ onto a set $B_n = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_K\}$,
$\mathbf{y}_k \in \mathcal{X}^n$, of reproducing codewords, called a source code of size $K$ and blocklength $n$. The rate
of the code is formally defined as $R = \frac{1}{n}\log_2 K$, and it signifies the number of bits per source
letter needed to specify the index of the reproducing codeword. When Bob receives these $R$ bits per letter,
he knows the reproducing codeword, which is an approximation to the actual source word. In Lo's
simple example $n = 1$, $K = 2$, $R = 1$ and $B_1$ consists of two codewords corresponding to the north
pole $y_1$ and south pole $y_2$, respectively. There each source word gets mapped onto the closest pole,
and knowledge of the codeword is equivalent to specifying the hemisphere. The goal is to minimize
the von Neumann entropy of the source word as viewed by Bob upon receiving the reproducing
codeword. Formally, each source word $\mathbf{x}$ gets mapped into a unique $\mathbf{y} \in B_n$ in such a way that the
posterior von Neumann entropy of the source

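$$S(B_n) = \frac{1}{n}\, E_{\mathbf{Y}}\, S\!\left(E_{\mathbf{X}|\mathbf{Y}}\, |\mathbf{X}\rangle\langle\mathbf{X}|\right) \qquad (1)$$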

is minimized. Here $\mathbf{Y}$ is the random vector associated with the probability distribution on the
set of codewords $B_n$ induced by our map. $E_{\mathbf{Y}}$ denotes the expectation value over the random
vector $\mathbf{Y}$, and $E_{\mathbf{X}|\mathbf{Y}}$ is the conditional expectation over $\mathbf{X}$ given the value of $\mathbf{Y}$. Let us analyze
the above expression. Let $M_{\mathbf{y}}$ be the set of all values of $\mathbf{X}$ that get mapped into $\mathbf{Y} = \mathbf{y}$. When
Bob learns that $\mathbf{Y} = \mathbf{y}$ he knows that $\mathbf{X}$ must have come from the set $M_{\mathbf{y}}$. The density matrix
he sees is now an average over all the $\mathbf{X}$'s from $M_{\mathbf{y}}$ and is denoted by the expectation value
$E_{\mathbf{X}|\mathbf{Y}=\mathbf{y}}\, |\mathbf{X}\rangle\langle\mathbf{X}|$. We average the corresponding von Neumann entropy over all the possible $\mathbf{Y}$'s
Bob could have received, and divide by $n$ to get a per letter result, thus giving rise to (1). In Lo's
example the random variable $Y$ takes on the values $y_1$ and $y_2$ with probabilities $\frac{1}{2}$ each, depending
on the hemisphere of $X$. The distribution of $X$ given $Y$ is uniform over the hemisphere indicated
by the value of $Y$. Thus (1) indeed yields the entropy obtained before.
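
The posterior entropy (1) can be estimated numerically for blocklength $n = 1$. The sketch below is a Monte Carlo illustration under stated assumptions: source letters are sampled uniformly on the Bloch sphere, each is mapped to the nearest codeword (largest inner product of Bloch vectors), and the entropy of each conditional density matrix is obtained from the length of its mean Bloch vector. The function names, sample size, and seed are hypothetical choices, not part of the original.

```python
import math
import random


def h2(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def random_bloch_vector(rng):
    """Draw a point uniformly from the unit sphere (uniform in z and phi)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def posterior_entropy(codewords, samples=100_000, seed=0):
    """Monte Carlo estimate of S(B_1) for a nearest-codeword map (n = 1)."""
    rng = random.Random(seed)
    sums = [[0.0, 0.0, 0.0] for _ in codewords]
    counts = [0] * len(codewords)
    for _ in range(samples):
        x = random_bloch_vector(rng)
        # Nearest codeword on the sphere = largest inner product.
        k = max(range(len(codewords)),
                key=lambda j: sum(a * b for a, b in zip(x, codewords[j])))
        counts[k] += 1
        for i in range(3):
            sums[k][i] += x[i]
    total = 0.0
    for k, c in enumerate(counts):
        if c == 0:
            continue
        # Mean Bloch vector of the states mapped to codeword k.
        r = math.sqrt(sum((s / c) ** 2 for s in sums[k]))
        total += (c / samples) * h2((1.0 + r) / 2.0)
    return total


poles = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
S_poles = posterior_entropy(poles)
print(f"estimated S(B_1) for the two-pole code: {S_poles:.3f}")
```

For the two-pole code the estimate should come out close to the value 0.81 quoted for Lo's example. Other codeword sets can be passed in to explore larger $K$ at $n = 1$, though this sketch says nothing about the gains available from blocking.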

Formally, a rate-entropy pair $(R, S)$ is called (asymptotically) achievable iff there exists a
sequence of source codes $B_n$ of rate $R$ and increasing blocklength $n$ such that

$$\lim_{n \to \infty} S(B_n) \le S \qquad (2)$$

We now define the rate-entropy function $R(S)$ as the infimum of all $R$ for which $(R, S)$ is
achievable. The way such a coding problem can be solved exactly is by first finding an
information-theoretical lower bound on $R(S)$ and then producing a coding scheme that achieves said bound.
Firstly, note that $\mathbf{Y}$ is completely determined by the corresponding value of $\mathbf{X}$, and hence the
conditional probability density $Q(\mathbf{y}|\mathbf{x})$ is a $\delta$-function. However, for the purpose of finding a lower
bound we relax this constraint. Secondly, observe the following string of inequalities

$$R = \frac{1}{n}\log_2 K \ge \frac{1}{n} H(\mathbf{Y}) \ge \frac{1}{n} I(\mathbf{X};\mathbf{Y}) \qquad (3)$$

The first inequality says that the entropy of $\mathbf{Y}$ is maximal when the codewords occur with
equal probability, in which case the entropy is simply $\log_2 K$. Intuitively, this is the number of
bits needed to specify one of $K$ equiprobable codewords. The second one follows from the definition
of mutual information $I(\mathbf{X};\mathbf{Y}) = H(\mathbf{Y}) - H(\mathbf{Y}|\mathbf{X})$ together with $H(\mathbf{Y}|\mathbf{X}) \ge 0$. For the purpose of finding a lower bound, we
consider minimizing the mutual information per letter instead of the rate, while keeping the von
Neumann entropy fixed. This leads to the following information-theoretical optimization problem.
Given $n$ and the random vector $\mathbf{X}$ as defined above, we wish to find



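$$R_n(S) = \frac{1}{n} \inf_{Q(\mathbf{y}|\mathbf{x}):\, S(Q) = S} I(Q) \qquad (4)$$

where $I(Q)$ is the mutual information

$$I(Q) = \iint d\mathbf{x}\, d\mathbf{y}\; p(\mathbf{x})\, Q(\mathbf{y}|\mathbf{x}) \log\frac{Q(\mathbf{y}|\mathbf{x})}{q(\mathbf{y})} = \iint d\mathbf{x}\, d\mathbf{y}\; q(\mathbf{y})\, P(\mathbf{x}|\mathbf{y}) \log\frac{P(\mathbf{x}|\mathbf{y})}{p(\mathbf{x})} \qquad (5)$$

and

$$S(Q) = \frac{1}{n} \int d\mathbf{y}\; q(\mathbf{y})\, S\!\left(\int d\mathbf{x}\, P(\mathbf{x}|\mathbf{y})\, |\mathbf{x}\rangle\langle\mathbf{x}|\right) \qquad (6)$$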



is the posterior von Neumann entropy, as in (1). The probability density for the marginal $\mathbf{Y}$
distribution is given by $q(\mathbf{y}) = \int d\mathbf{x}\, p(\mathbf{x})\, Q(\mathbf{y}|\mathbf{x})$ and the conditional distribution for $\mathbf{X}$ given $\mathbf{Y}$ is
$P(\mathbf{x}|\mathbf{y}) = p(\mathbf{x})\, Q(\mathbf{y}|\mathbf{x}) / q(\mathbf{y})$. The minimization should be done for a general length $n$ of $\mathbf{x}$. We
have found a local extremum of this problem which we conjecture to be global, for which the
conditional distribution factorizes, i.e. $Q(\mathbf{y}|\mathbf{x}) = \prod_i Q^\lambda(y_i|x_i)$ where

$$Q^\lambda(y|x) = P^\lambda(x|y)$$
so that $n = 1$ suffices. Here $\lambda$ plays the role of a Lagrange multiplier. Some light may be shed
on this result by noticing that there are two competing effects. One comes from subadditivity of
von Neumann entropy, which says that the von Neumann entropy of the whole is no greater than
the sum of the von Neumann entropies of the parts. This favors large $n$ in order to decrease the
von Neumann entropy per letter. The other comes from superadditivity of mutual information,
valid only when $\mathbf{X}$ is i.i.d. (as in our case), which states that the mutual information between $\mathbf{X}$
and $\mathbf{Y}$ is no less than the sum of the mutual informations between the corresponding components
$X_i$ and $Y_i$. This favors $n = 1$. The latter effect apparently wins. The corresponding $R_1(S)$ is
parametrized as follows:
where $\lambda \in (0, \infty)$ and $h_2(p) = -p\log_2 p - (1-p)\log_2(1-p)$ is the binary Shannon entropy
function. $R_1(\lambda)$ is given in nats, and should be converted into bits by dividing by $\log 2$. The curve
is readily found to be convex, and is shown in Fig. 2.
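
To get a feel for how a one-parameter rate-entropy curve of this kind can be traced numerically, here is an illustrative Python sketch. It is not a transcription of the parametrization above, which is not reproduced in this text. Instead it assumes, purely for illustration, a single-letter conditional density on the Bloch sphere proportional to $e^{\lambda\cos\theta}$, where $\theta$ is the angle between $x$ and $y$. Under that assumption the mean Bloch vector has length $\coth\lambda - 1/\lambda$, and the mutual information with a uniform marginal equals the relative entropy to the uniform density. All function names are hypothetical.

```python
import math


def h2(p):
    """Binary Shannon entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mean_cos(lam):
    """Mean of cos(theta) under the ASSUMED density ~ exp(lam * cos(theta))."""
    return 1.0 / math.tanh(lam) - 1.0 / lam


def entropy_bits(lam):
    """Posterior von Neumann entropy per qubit (bits) for the assumed density."""
    return h2((1.0 + mean_cos(lam)) / 2.0)


def rate_nats(lam):
    """Relative entropy (nats) of the assumed density to the uniform one."""
    return math.log(lam / math.sinh(lam)) + lam * mean_cos(lam)


def rate_bits(lam):
    """Convert nats to bits by dividing by log 2."""
    return rate_nats(lam) / math.log(2.0)


def lambda_for_rate(bits, lo=1e-2, hi=50.0, iters=60):
    """Bisection for the lambda at which rate_bits equals the target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_bits(mid) < bits:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for lam in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"lambda={lam:4.1f}  R={rate_bits(lam):.3f} bits  S={entropy_bits(lam):.3f}")

lam_1bit = lambda_for_rate(1.0)
S_1bit = entropy_bits(lam_1bit)
```

Within this assumed family the rate grows and the entropy falls as $\lambda$ increases, and the entropy at one bit of rate (`S_1bit`) can be compared directly with the hemisphere value of about 0.81.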




FIG. 2 The rate-entropy function $R(S)$.



So far we have only found a lower bound on $R(S)$. Now we will demonstrate achievability, and
thus establish that $R(S) = R_1(S)$. It may appear that blocking was not needed after all, but this
is due to the fact that we have not quite solved the coding problem. In particular, our solution
$Q^\lambda(y|x)$ is not deterministic, as a code should be, but probabilistic. Given $x$, $y$ is most likely to
be $x$ itself, and then as the arc distance from $x$ increases the probability decreases, reaching a
minimum at the antipode of $x$. It is only in the $\lambda \to \infty$ limit that $Q^\lambda(y|x)$ becomes a $\delta$-function
centered at $x$, which corresponds to the identity map. This also implies that the second inequality
in (3) is not tight. However, one could expect it to become tight in the large blocklength limit, since
$H(\mathbf{Y})$ is subadditive, and $I(\mathbf{X};\mathbf{Y})$ is superadditive. The idea is to simulate the noisy single letter
channel defined by $P^\lambda(x|y)$ (acting in the reverse direction, i.e. from $Y$ to $X$) by the average effect
that a deterministic coding map (from $\mathbf{X}$ to $\mathbf{Y}$) involving large strings of letters has on the $i$th
letter. To elaborate, let us assume that the $i$th letter in a given codeword $\mathbf{y}$ is some $y_i$. Then our
code is such that the $i$th components $x_i$ of all the $\mathbf{x}$'s that get mapped onto $\mathbf{y}$ are distributed as if
randomly chosen according to the conditional distribution $P^\lambda(x_i|y_i)$. Since $P^\lambda(x|y)$ depends only
on the overlap $\langle x|y\rangle$, when Alice rotates $\mathbf{x}$ by the map that sends $\mathbf{y}$ to $\mathbf{0}$, the block density matrix
Bob sees after being told the codeword is the Schumacher compression friendly tensor product of
single qubit density matrices $\rho' = \int dx\, P^\lambda(x|0)\,|x\rangle\langle x|$ with entropy per qubit given by $S(\lambda)$.
The way to construct such a coding map is by using joint typicality decoding, a technique well
known in classical rate-distortion theory [ref]. It is necessary first to coarse grain $\mathcal{X}$ into a disjoint
union of small near-circular caps of diameter $\approx \epsilon$ and replace the probability densities $P^\lambda(x|y)$
etc. by discrete probabilities $P^\lambda(\hat{x}|\hat{y})$ etc., where $\hat{x}$ and $\hat{y}$ belong to $\hat{\mathcal{X}}$, the set of cap centroids. A
$\delta$-typical sequence $\hat{\mathbf{x}} \in \hat{\mathcal{X}}^n$ with respect to the distribution $p(\hat{x})$ is defined as one that satisfies



where $N(a|\hat{\mathbf{x}})$ is the number of occurrences of $a \in \hat{\mathcal{X}}$ in the sequence $\hat{\mathbf{x}}$. We call the set of
all such typical sequences the typical set $T_\delta(p)$. In words, a sequence is typical if the fraction
of appearances of any given letter in the sequence approximates the probability for that letter.
Another way of putting it is that picking an element of the sequence at random approximately
simulates the probability distribution. Note that, by the law of large numbers, a sufficiently long
sequence chosen according to the probability distribution will "almost always" be typical. One
similarly defines the jointly typical set $T_\delta(P^\lambda q)$ of pairs of typical sequences $(\hat{\mathbf{x}}, \hat{\mathbf{y}}) \in (\hat{\mathcal{X}} \times \hat{\mathcal{X}})^n$ with
respect to the distribution $P^\lambda(\hat{x}|\hat{y})\,q(\hat{y})$. The coding map is as follows:

• The codewords $\hat{\mathbf{y}}$ are chosen at random. More precisely, each letter of each codeword is chosen
according to $q(\hat{y})$ (which mimics the uniform distribution). This ensures with high probability that
the codewords will be typical of the distribution $q(\hat{y})$.

• A given $\mathbf{x}$ is mapped onto a $\hat{\mathbf{y}}$ with the property that the pair $(\hat{\mathbf{x}}, \hat{\mathbf{y}})$ is typical of the joint
distribution $P^\lambda(\hat{x}|\hat{y})\,q(\hat{y})$. Here $\hat{\mathbf{x}}$ is obtained by replacing each component of $\mathbf{x}$ with the centroid of the cap that contains it. This
implies that if we randomly pick an $\mathbf{x}$ and its corresponding $\hat{\mathbf{y}}$, the $i$th component pair will equal
$(\hat{x}_i, \hat{y}_i)$ with probability $P^\lambda(\hat{x}_i|\hat{y}_i)\,q(\hat{y}_i)$. Hence, given $\hat{y}_i$, $\hat{x}_i$ was the source letter with probability
$P^\lambda(\hat{x}_i|\hat{y}_i)$. This is how the noisy channel $P^\lambda(x|y)$ is simulated.

The above map fails when there are not enough reproducing codewords to ensure that one can
find a member of the code $B_n$ jointly typical with a given $\hat{\mathbf{x}}$. It turns out [ref] that the minimal rate
for which such an error "almost never" occurs is precisely the mutual information corresponding
to $P^\lambda(\hat{x}|\hat{y})\,q(\hat{y})$, which is approximated by $R_1(\lambda)$. Finally, it is necessary to take the $\epsilon, \delta \to 0$
and $n \to \infty$ limits carefully to ensure that the pair $(R, S)$ indeed approaches the $R_1(S)$ curve
arbitrarily closely [ref]. Note that joint typicality decoding is suboptimal, strictly speaking. The
actual optimal map makes no reference to coarse graining. The code $B_n$ is chosen at random, and
the coding map is the one that minimizes $S(B_n)$. However, stating it that way gives us little hope
of computing $R(S)$.
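
The notion of typicality used in this construction can be made concrete with a short Python sketch. It assumes one common convention, namely that every letter's empirical frequency must lie within `delta` of its probability; the exact tolerance convention intended in the text may differ. Joint typicality is checked by applying the same test to the sequence of letter pairs. Function names and the toy alphabet are hypothetical.

```python
from collections import Counter


def is_typical(sequence, p, delta):
    """True if each letter's empirical frequency is within delta of p[letter].

    `p` maps every letter of the (finite, coarse-grained) alphabet to its
    probability. The per-letter tolerance is an assumed convention.
    """
    n = len(sequence)
    if n == 0:
        return False
    counts = Counter(sequence)
    # A letter outside the alphabet can never be typical.
    if any(a not in p for a in counts):
        return False
    return all(abs(counts.get(a, 0) / n - prob) <= delta
               for a, prob in p.items())


def is_jointly_typical(xs, ys, joint, delta):
    """Apply the same frequency test to the sequence of pairs (x_i, y_i)."""
    if len(xs) != len(ys):
        return False
    return is_typical(list(zip(xs, ys)), joint, delta)


# Toy two-letter alphabet standing in for the set of cap centroids.
p = {"a": 0.5, "b": 0.5}
joint = {("a", "a"): 0.5, ("b", "b"): 0.5, ("a", "b"): 0.0, ("b", "a"): 0.0}

balanced = is_typical("abab", p, delta=0.1)
skewed = is_typical("aaab", p, delta=0.1)
matched = is_jointly_typical("abab", "abab", joint, delta=0.1)
mismatched = is_jointly_typical("abab", "baba", joint, delta=0.1)
```

In the coding map described above, an encoder would search the randomly generated codebook for a codeword that passes the joint test with the coarse-grained source word; the map fails when no such codeword exists.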

Our RSP protocol is now analogous to the simple one described earlier. Alice wishes to remotely
prepare a string of $n$ qubits using an $(R, S)$ source code. She identifies the corresponding codeword
and rotates the original string by the map that sends the codeword to $\mathbf{0}$ (this is analogous to
mapping the south pole onto the north pole in Lo's example), and prepares these qubits in her
laboratory. She may Schumacher compress them without additional blocking to $Sn$ qubits. She
teleports these to Bob using $2Sn$ classical bits and $Sn$ ebits. A further $Rn$ bits are sent in order to
convey the codeword. Bob reverses Alice's steps in his laboratory, thus recovering an asymptotically
faithful copy of the qubits to be prepared. The corresponding point in the $(b,e)$-plane is $(R + 2S, S)$
per remotely prepared state. The ebits vs. bits tradeoff curve is shown by the dotted curve in Fig. 1
and is parametrized by $(R_1(\lambda) + 2S(\lambda), S(\lambda))$.

It should be noted that our protocol does not require back-communication, since it is based on 
teleportation, which enjoys the same property. We conjecture that teleportation based protocols 
are optimal among all low-entanglement protocols, and hence that our result is exact. To show this
formally it is crucial to understand the high-entanglement region, since we expect other candidates 
to be "generated" by special points in the high-entanglement region in the same way our upper 
bound was generated by the teleportation point via R(S). 